Efficient PAC learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digraphs

Author

Theologos Bountourelis

Abstract

This paper considers the problem of computing an optimal policy for a Markov Decision Process (MDP) when complete a priori knowledge is lacking about (i) the branching probability distributions that determine the evolution of the process state upon the execution of the different actions, and (ii) the probability distributions characterizing the immediate rewards returned by the environment as a result of executing these actions at different states of the process. In addition, it is assumed that the underlying process evolves in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in the process of deriving these results, the presented work generalizes Bechhofer’s “indifference-zone” approach for the Ranking & Selection problem, which arises in statistical inference theory, so that it applies to populations with bounded general distributions.
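
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the use of Hoeffding's inequality with a union bound, and the toy data are all assumptions made for illustration. The first function gives a per-option sample size that suffices to pick, with probability at least 1 - fail_prob, an option whose mean is within the indifference zone of the best, when observations are bounded. The second computes a greedy policy from estimated transition probabilities and mean rewards by a single backward pass over an acyclic state space.

```python
import math


def hoeffding_samples(value_range, indifference, fail_prob, num_options):
    """Samples per option so that every empirical mean lies within
    indifference / 2 of its true mean, with total failure probability
    at most fail_prob (Hoeffding's inequality plus a union bound).
    Assumes each observation lies in an interval of width value_range."""
    eps = indifference / 2.0
    per_option_fail = fail_prob / num_options
    n = value_range ** 2 * math.log(2.0 / per_option_fail) / (2.0 * eps ** 2)
    return math.ceil(n)


def backward_induction(states_topo, actions, p_hat, r_hat):
    """Greedy policy for an acyclic MDP from estimated quantities.

    states_topo: states in topological order, initial state first.
    actions:     dict state -> list of actions (missing/empty = terminal).
    p_hat:       dict (state, action) -> {next_state: estimated probability}.
    r_hat:       dict (state, action) -> estimated mean immediate reward.
    """
    value, policy = {}, {}
    # Visit successors before predecessors; acyclicity makes one pass enough.
    for s in reversed(states_topo):
        choices = actions.get(s, [])
        if not choices:
            value[s] = 0.0  # terminal state: no further reward
            continue
        q = {
            a: r_hat[(s, a)]
            + sum(p * value[t] for t, p in p_hat[(s, a)].items())
            for a in choices
        }
        policy[s] = max(q, key=q.get)
        value[s] = q[policy[s]]
    return value, policy


# Hypothetical toy task: s0 is the initial state, "end" is terminal.
states_topo = ["s0", "s1", "s2", "end"]
actions = {"s0": ["a", "b"], "s1": ["c"], "s2": ["c"]}
p_hat = {
    ("s0", "a"): {"s1": 1.0},
    ("s0", "b"): {"s2": 0.5, "end": 0.5},
    ("s1", "c"): {"end": 1.0},
    ("s2", "c"): {"end": 1.0},
}
r_hat = {("s0", "a"): 0.0, ("s0", "b"): 1.0, ("s1", "c"): 3.0, ("s2", "c"): 2.0}

value, policy = backward_induction(states_topo, actions, p_hat, r_hat)
samples = hoeffding_samples(1.0, 0.2, 0.05, 2)
```

In this toy task, action a at s0 has estimated value 3 and action b has estimated value 2, so the backward pass selects a. The sample-size formula is a generic, conservative bound; the paper's own analysis may use a different argument and tighter constants.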


Similar resources

Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces


Efficient schedules for the problem of optimal node visitation in acyclic stochastic digraphs

Given a stochastic, acyclic, connected digraph with a single source node and a control agent that repetitively traverses this graph, each time starting from the source node, we want to define a control policy that will enable this agent to visit each of the graph terminal nodes a prespecified number of times, while minimizing the expected number of the graph traversals. We formulate this proble...


Arrival probability in the stochastic networks with an established discrete time Markov chain

The probable lack of some arcs and nodes in the stochastic networks is considered in this paper, and its effect is shown as the arrival probability from a given source node to a given sink node. A discrete time Markov chain with an absorbing state is established in a directed acyclic network. Then, the probability of transition from the initial state to the absorbing state is computed. It is as...


Longest Path in Networks of Queues in the Steady-State

Due to the importance of longest path analysis in networks of queues, we develop an analytical method for computing the steady-state distribution function of longest path in acyclic networks of queues. We assume the network consists of a number of queuing systems and each one has either one or infinite servers. The distribution function of service time is assumed to be exponential or Erlang. Fu...


Learning Evaluation Functions for Large Acyclic Domains

Some of the most successful recent applications of reinforcement learning have used neural networks and the TD(λ) algorithm to learn evaluation functions. In this paper, we examine the intuition that TD(λ) operates by approximating asynchronous value iteration. We note that on the important subclass of acyclic tasks, value iteration is inefficient compared with another graph algorithm, DAG-SP, wh...


Journal:

Discrete Event Dynamic Systems

Volume 17


Publication date: 2007
